-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor to use wrapper function to wrap all the codes in cast_string_to_function.h #2261
Conversation
f36dd93
to
0baad2e
Compare
5f3a5ef
to
eb65ca0
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #2261 +/- ##
========================================
Coverage 89.60% 89.61%
========================================
Files 1016 1024 +8
Lines 35808 36031 +223
========================================
+ Hits 32085 32288 +203
- Misses 3723 3743 +20
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general comment is that instead of keeping different stringCast operation, we should have a generic stringCast operation class and information should be passed in as BindData.
bool BaseCSVReader::TryCastVector(Vector &parse_chunk_col, idx_t size, const LogicalType &sql_type) {
// try vector-cast from string to sql_type
Vector dummy_result(sql_type);
if (options.has_format[LogicalTypeId::DATE] && sql_type == LogicalTypeId::DATE) {
// use the date format to cast the chunk
string error_message;
idx_t line_error;
return TryCastDateVector(options, parse_chunk_col, dummy_result, size, error_message, line_error);
} else if (options.has_format[LogicalTypeId::TIMESTAMP] && sql_type == LogicalTypeId::TIMESTAMP) {
// use the timestamp format to cast the chunk
string error_message;
return TryCastTimestampVector(options, parse_chunk_col, dummy_result, size, error_message);
} else {
// target type is not varchar: perform a cast
string error_message;
return VectorOperations::DefaultTryCast(parse_chunk_col, dummy_result, size, &error_message, true);
}
}
This is how duckdb implements its casting function.
|
||
// nested types | ||
template<typename T> | ||
static void operation(const char* input, uint64_t len, common::ValueVector* vector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our ultimate goal is to only keep one string cast operation. CSVReaderConfig should be passed as bindData, and rowToAdd should be replaced with dstValue. You can take a look at how duckdb achieves this, they have a class called BoundCastInfo. You can change this in later PRs.
src/include/function/cast/functions/cast_string_non_nested_functions.h
Outdated
Show resolved
Hide resolved
src/include/function/cast/functions/cast_string_non_nested_functions.h
Outdated
Show resolved
Hide resolved
0a39452
to
f5f6199
Compare
throw common::OverflowException{common::stringFormat( | ||
"Value {} is not within INT64 range", common::TypeUtils::toString(input).c_str())}; | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is covered by test
template<> | ||
inline void CastStringToTypes::operation(const char* input, uint64_t len, bool& result) { | ||
castStringToBool(input, len, result); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking gdb, this line is covered
d5ebb51
to
21ea21c
Compare
} | ||
|
||
template<typename T> | ||
static void operation(const char* input, uint64_t len, T& result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will combine into 1 function after implementing casting const char* to ku_string_t
cast_string_to_function.h
inline void CastToDouble::operation(common::int128_t& input, double_t& result) { | ||
if (!common::Int128_t::tryCast(input, result)) { // LCOV_EXCL_START | ||
throw common::OverflowException{common::stringFormat( | ||
"Value {} is not within DOUBLE range", common::TypeUtils::toString(input).c_str())}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this excluded from coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int128 to double can't trigger overflow since int128 is always in double range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INT128 is always in double range
is not true.
Large integer numbers in double are not consecutive because of the precision. For example: 2^100 may be representable in double but 2^100 + 1 cannot be represented in double. I am not sure what is the expected behaviour in this case. Maybe check duckdb's solution. I am sure that c++ just silently loose precision in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting to double and losing precision is not surprising. Postgres says that the number stored may be inexact and to use numeric for exactness.
Let's add a comment for why this is unreachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duckdb always returns true (with losing precision) and so for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
│ CAST(CAST('170141183460469231731687303715884105727' AS HUGEINT) AS DOUBLE) │ |
---|
│ double |
│ 1.7014118346046923e+38 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point Ziyi is making is that if you add 1 to the integer there, the actual value stored will likely be the same because doubles are not arbitrary precision.
operation
warpper